Reinforced Multi-Teacher Selection for Knowledge Distillation

Authors

Abstract

In natural language processing (NLP) tasks, slow inference speed and huge footprints in GPU usage remain the bottleneck of applying pre-trained deep models in production. As a popular method for model compression, knowledge distillation transfers knowledge from one or multiple large (teacher) models to a small (student) model. When multiple teacher models are available in distillation, state-of-the-art methods assign a fixed weight to each teacher model for the whole distillation. Furthermore, most existing methods allocate an equal weight to every teacher model. In this paper, we observe that, due to the complexity of training examples and the differences in student model capability, learning differentially from teacher models can lead to better performance of the distilled student models. We systematically develop a reinforced method to dynamically assign weights to teacher models for different training instances and optimize the performance of the student model. Our extensive experimental results on several NLP tasks clearly verify the feasibility and effectiveness of our approach.
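To make the idea concrete, the sketch below illustrates one way such per-instance teacher selection could be wired up in PyTorch: a small policy network scores the teachers for each training example, the student distills from the sampled teacher, and the policy is updated with a REINFORCE-style gradient. The toy linear models, the reward definition (the student's post-update confidence on the gold label), and all names here are illustrative assumptions, not the authors' published implementation.

```python
# Minimal, hypothetical sketch of per-instance multi-teacher distillation.
# All model sizes, the policy form, and the reward are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_TEACHERS, HIDDEN, NUM_CLASSES, T = 3, 64, 2, 2.0  # T = softmax temperature

student = nn.Linear(HIDDEN, NUM_CLASSES)
teachers = [nn.Linear(HIDDEN, NUM_CLASSES) for _ in range(NUM_TEACHERS)]
# Policy network: maps an example's features to a distribution over teachers.
policy = nn.Linear(HIDDEN, NUM_TEACHERS)

opt_student = torch.optim.Adam(student.parameters(), lr=1e-3)
opt_policy = torch.optim.Adam(policy.parameters(), lr=1e-3)

def distill_step(x, y):
    """One step: sample a teacher per example, distill, then reward the policy."""
    probs = F.softmax(policy(x), dim=-1)               # (batch, NUM_TEACHERS)
    dist = torch.distributions.Categorical(probs)
    choice = dist.sample()                             # teacher index per example

    with torch.no_grad():                              # teachers are frozen
        teacher_logits = torch.stack([t(x) for t in teachers], dim=1)
    picked = teacher_logits[torch.arange(x.size(0)), choice]   # (batch, classes)

    s_logits = student(x)
    # Soft-label KL distillation loss plus hard-label cross-entropy.
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                  F.softmax(picked / T, dim=-1),
                  reduction="batchmean") * T * T
    ce = F.cross_entropy(s_logits, y)
    loss = 0.5 * kd + 0.5 * ce
    opt_student.zero_grad(); loss.backward(); opt_student.step()

    # REINFORCE: reward the policy with the student's per-example confidence
    # on the gold label after the update (one plausible reward choice).
    with torch.no_grad():
        reward = F.softmax(student(x), dim=-1)[torch.arange(x.size(0)), y]
        baseline = reward.mean()                       # variance-reduction baseline
    pg_loss = -(dist.log_prob(choice) * (reward - baseline)).mean()
    opt_policy.zero_grad(); pg_loss.backward(); opt_policy.step()
    return loss.item()

x, y = torch.randn(8, HIDDEN), torch.randint(0, NUM_CLASSES, (8,))
print(distill_step(x, y))
```

Sampling a single teacher per example (rather than mixing all teachers with fixed equal weights) is what makes the assignment dynamic and instance-dependent; the baseline-subtracted reward keeps the policy gradient from drifting on noisy per-example rewards.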


Similar resources

Sequence-Level Knowledge Distillation

Neural machine translation (NMT) offers a novel alternative formulation of translation that is potentially simpler than statistical approaches. However, to reach competitive performance, NMT models need to be exceedingly large. In this paper we consider applying knowledge distillation approaches (Bucila et al., 2006; Hinton et al., 2015) that have proven successful for reducing the size of neura...


Knowledge Distillation for Bilingual Dictionary Induction

Leveraging zero-shot learning to learn mapping functions between vector spaces of different languages is a promising approach to bilingual dictionary induction. However, methods using this approach have not yet achieved high accuracy on the task. In this paper, we propose a bridging approach, where our main contribution is a knowledge distillation training objective. As teachers, rich resource ...


Topic Distillation with Knowledge Agents

This is the second year that our group has participated in TREC's Web track. Our experiments focused on the Topic distillation task. Our main goal was to experiment with the Knowledge Agent (KA) technology [1], previously developed at our Lab, for this particular task. The knowledge agent approach was designed to enhance Web search results by utilizing domain knowledge. We first describe the generi...


A Fuzzy Approach For Multi-Objective Supplier Selection

Assessment and selection of suppliers are two of the most important tasks in the purchasing function of supply chain management. Supplier selection can be considered to be a single- or multi-objective problem. From another point of view, it can be a single- or multi-sourcing problem. In this paper, an integrated AHP and Fuzzy TOPSIS model is proposed to solve the supplier selection problem. This model makes...


Teacher Knowledge for Teaching Statistics through Investigations

This report compares the teacher knowledge of two early career primary school teachers (drawn from a study of four teachers) as it was needed in the classroom during the teaching of statistics through investigations. The study involved video recording a sequence of four or five lessons and audio recording post-lesson stimulated recall interviews with the teachers. These interviews were based on ...



Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i16.17680